New Audio Analysis Method with Application for Synthesis, Editing, and Compression

Tech ID: 20002 / UC Case 2006-177-0

Background

Sound analysis and synthesis is employed in many applications, including speech recognition, speech synthesis, sound editing and transformation, active noise reduction systems, modeling of general acoustical sources such as musical instruments and voices, and sound compression and data storage. Known methods and systems decompose sound into a collection of sinusoids with varying amplitudes, frequencies, and, in some cases, phases. Decomposition is typically accomplished with Fourier analysis over short time frames, followed by peak detection, interpolation, and tracking of partials across sequences of fast Fourier transform (FFT) vectors. To avoid noise and imperfections introduced by the limitations of the FFT characterization, additional steps are often applied to characterize the noisy part of the audio signal. There are recognized tradeoffs among accuracy, computational complexity, compression rate, and related factors that remain an active subject of research.
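The conventional pipeline described above (short-time Fourier analysis, peak detection, interpolation) can be sketched as follows; this is a generic illustration of the prior art, not the patented method, and the function name and threshold are assumptions:

```python
import numpy as np

def detect_peaks(frame, sr, threshold_db=-60.0):
    """Locate sinusoidal peaks in one windowed frame: FFT magnitude,
    local-maximum picking, and parabolic interpolation of the bin index."""
    n = len(frame)
    mag = np.abs(np.fft.rfft(frame * np.hanning(n)))
    mag_db = 20.0 * np.log10(mag + 1e-12)
    peaks = []
    for k in range(1, len(mag_db) - 1):
        a, b, c = mag_db[k - 1], mag_db[k], mag_db[k + 1]
        if b > threshold_db and b > a and b > c:
            delta = 0.5 * (a - c) / (a - 2.0 * b + c)   # fractional-bin offset
            peaks.append(((k + delta) * sr / n,          # refined frequency (Hz)
                          b - 0.25 * (a - c) * delta))   # interpolated level (dB)
    return peaks

# Example: a 440 Hz tone yields its strongest peak near 440 Hz.
sr = 8000
t = np.arange(1024) / sr
peaks = detect_peaks(np.sin(2 * np.pi * 440.0 * t), sr)
```

A full sinusoidal coder would then track such peaks from frame to frame to form partials, which is where the imperfections mentioned above arise.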

Technology Description

A researcher from UC San Diego has developed a new audio coding method that allows efficient decomposition of an audio signal into periodic and noise components. The components can be recombined after processing operations, such as compression or editing, to reconstruct a modified version of the audio signal. The sound model can also be used to store and modify clips of sounds for synthesis applications, such as concatenative synthesis of speech or music.
Specifically, this technology provides a robust and efficient sound analysis and synthesis method using high bit rate encoding of an audio signal. An audio signal is decomposed into sinusoidal, modulated sinusoidal, and noise components by comparing two separate spectral representations. The autoregressive (AR) and minimum variance distortionless response (MVDR) spectra are calculated from linear prediction coefficients in a time-varying manner. The spectral envelope of spectral lines in noise is estimated from selected properties of the spectral representations. A noisality index is derived that assigns different weights to the contributions of sinusoidal and noise components at every frequency. The noisality index is used to reduce the order of the AR model and to perform re-synthesis of the sinusoidal and noise components.
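A minimal numerical sketch of the two spectral estimates being compared is given below. The AR spectrum sharpens around sinusoids while the MVDR spectrum gives an unbiased, smoother power estimate, so their ratio behaves like a per-frequency noisality indicator. The ratio used here is an illustrative assumption, not the patented formula:

```python
import numpy as np

def levinson(r, order):
    """Levinson-Durbin recursion: autocorrelation -> LPC coefficients and error."""
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]
    for i in range(1, order + 1):
        k = -(r[i] + a[1:i] @ r[i - 1:0:-1]) / err
        a[1:i] = a[1:i] + k * a[i - 1:0:-1]
        a[i] = k
        err *= 1.0 - k * k
    return a, err

def ar_spectrum(a, err, w):
    """AR (all-pole) power spectrum err / |A(e^{jw})|^2 on grid w (rad/sample)."""
    A = np.exp(-1j * np.outer(w, np.arange(len(a)))) @ a
    return err / np.abs(A) ** 2

def mvdr_spectrum(r, order, w):
    """MVDR power spectrum 1 / (e^H R^{-1} e) from the Toeplitz autocorrelation."""
    idx = np.abs(np.subtract.outer(np.arange(order + 1), np.arange(order + 1)))
    Rinv = np.linalg.inv(r[idx])
    E = np.exp(-1j * np.outer(w, np.arange(order + 1)))   # steering vectors
    return 1.0 / np.real(np.einsum("fi,ij,fj->f", E.conj(), Rinv, E))

# Sinusoid in noise: the AR spectrum overshoots at the tone while MVDR does
# not, so the ratio is small at tonal frequencies and near 1 in noise regions.
sr, order = 8000, 12
rng = np.random.default_rng(0)
t = np.arange(2048) / sr
x = np.sin(2 * np.pi * 440.0 * t) + 0.05 * rng.standard_normal(t.size)
r = np.correlate(x, x, "full")[t.size - 1:t.size + order] / t.size
a, err = levinson(r, order)
w = np.linspace(0.0, np.pi, 257)
noisality = np.clip(mvdr_spectrum(r, order, w) / ar_spectrum(a, err, w), 0.0, 1.0)
```

Because the MVDR spectrum never exceeds the AR spectrum of the same order, this ratio naturally lies in [0, 1], which makes it convenient as a per-frequency weight.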

Advantages

• Robust low-bitrate audio encoding method based on parametric sound model.
• Model is capable of encoding music and/or multiple speakers (unlike parametric speech models, which assume a single speaker).
• The coding captures both partials and noise in one representation, which is easy to use, encode, or edit.
• Decisions about the periodic and noise parts are done in the decoder, while the encoder remains "light-weight."
• Modular representation offers progressive coding, so that intelligibility and quality degrade gracefully as the bitrate is reduced.
• Efficient method whose implementation utilizes existing optimized hardware and software architectures.

Publications

Dubnov, S. “YASAS – Yet another sound analysis-synthesis method”, in Proceedings of the International Computer Music Conference (ICMC), New Orleans, 2006.

Patent Status

Patent Pending


Other Information

Keywords

audio codecs, compression
